Utility–Privacy Trade-Offs with Limited Leakage for Encoder

The utilization of databases, for example in IoT, has progressed, and protecting the privacy of the stored data is an important issue. In pioneering work in 1983, Yamamoto considered a source (database) consisting of public information and private information, and found the theoretical limits (first-order rate analysis) among the coding rate, utility, and privacy for the decoder in two special cases. In this paper, we consider a more general case based on the work by Shinohara and Yagi in 2022. Introducing a measure of privacy for the encoder, we investigate the following two problems. The first problem is the first-order rate analysis among the coding rate, utility, privacy for the decoder, and privacy for the encoder, in which utility is measured by the expected distortion or the excess-distortion probability. The second problem is establishing the strong converse theorem for utility–privacy trade-offs, in which utility is measured by the excess-distortion probability. These results may lead to a more refined analysis such as the second-order rate analysis.


Background
The utilization of databases has progressed in our society; examples include autonomous cars and congestion data services over the Internet. At the same time, the risk of accidental or intentional leakage of private information has also increased rapidly. To protect private information, coding with a privacy constraint has been analyzed via an information-theoretic approach. In 1983, Yamamoto [1] introduced a framework to quantify the utility of databases and the privacy of personal information and analyzed the trade-offs between them. Decades later, in 2013, Sankar et al. [2] argued that databases must be converted to protect privacy while maintaining the utility of data, and Yamamoto's framework [1] was revisited by Sankar et al. and other researchers. Using rate-distortion theory, Yamamoto revealed the optimal relationships (theoretical limits) among coding rate, utility, and privacy in two cases: (i) public information that can be open to the public and private information that should be protected from a third party are both encoded, and (ii) only public information is encoded. However, the more general case (iii), in which public information and a part of the private information are encoded, had not been clarified, and Shinohara and Yagi [3] derived the theoretical limits in that case (see Figure 1). The resulting characterization of the achievable region gives a "unified expression" because it includes the characterizations given in [1] for cases (i) and (ii) as special cases.
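$X_{\mathcal{H}}^n = (X_{\mathcal{H},1}, X_{\mathcal{H},2}, \ldots, X_{\mathcal{H},n}) \in \mathcal{X}_{\mathcal{H}}^n$ (5) are referred to as the revealed source sequence and the hidden source sequence, respectively. In the coding system introduced in [22], the revealed symbols and a part of the hidden symbols are input to the encoder, and thus the encoded alphabet $\mathcal{E}$ satisfies $\mathcal{R} \subseteq \mathcal{E} \subseteq \mathcal{K}$. Similar to (3), $X_{\mathcal{K}}^n$ is sometimes described as $X_{\mathcal{K}}^n = (X_{\mathcal{E}}^n, X_{\mathcal{E}^c}^n)$ (6), where $X_{\mathcal{E}}^n$ is the source sequence observed by the encoder and $\mathcal{E}^c = \mathcal{K} \setminus \mathcal{E}$.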

Encoder and Decoder
The coding system consists of an encoder $f_n$ and a decoder $g_n$, as in Figure 1. When the source sequence $X_{\mathcal{K}}^n = (X_{\mathcal{E}}^n, X_{\mathcal{E}^c}^n)$ is generated from the stationary and memoryless source $P_{X_{\mathcal{K}}}$, the codeword $J_n = f_n(X_{\mathcal{E}}^n)$ is generated by the encoder $f_n : \mathcal{X}_{\mathcal{E}}^n \to \{1, 2, \ldots, M_n\}$ (7), and the reproduced sequence $\hat{X}_{\mathcal{R}}^n = g_n(J_n)$ is produced by the decoder $g_n : \{1, 2, \ldots, M_n\} \to \hat{\mathcal{X}}_{\mathcal{R}}^n$, where $M_n$ denotes the number of codewords.

Performance Measures
In this section, we define the measures of the coding rate, utility, privacy for the decoder, and privacy for the encoder. Hereafter, let a pair of encoder and decoder $(f_n, g_n)$ be fixed.
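For a given $M_n$, the coding rate is defined as $r_n := \frac{1}{n} \log M_n$. Let $d : \mathcal{X}_{\mathcal{R}} \times \hat{\mathcal{X}}_{\mathcal{R}} \to [0, \infty)$ be a distortion function between $x_{\mathcal{R}} \in \mathcal{X}_{\mathcal{R}}$ and $\hat{x}_{\mathcal{R}} \in \hat{\mathcal{X}}_{\mathcal{R}}$. The distortion between sequences $x_{\mathcal{R}}^n \in \mathcal{X}_{\mathcal{R}}^n$ and $\hat{x}_{\mathcal{R}}^n \in \hat{\mathcal{X}}_{\mathcal{R}}^n$ is defined as $d(x_{\mathcal{R}}^n, \hat{x}_{\mathcal{R}}^n) := \sum_{i=1}^{n} d(x_{\mathcal{R},i}, \hat{x}_{\mathcal{R},i})$. Then, the measure of utility is defined as $u_n := \frac{1}{n} \mathrm{E}[d(X_{\mathcal{R}}^n, \hat{X}_{\mathcal{R}}^n)]$, where $\mathrm{E}$ represents the expectation with respect to the joint distribution of $(X_{\mathcal{R}}^n, \hat{X}_{\mathcal{R}}^n)$. In this system, the privacy of the hidden source sequence $X_{\mathcal{H}}^n$ should be protected when the codeword $J_n$ is observed by the decoder $g_n$. The measure of privacy for the decoder is defined as $l_n := \frac{1}{n} I(X_{\mathcal{H}}^n; J_n)$ (12), where $I(X_{\mathcal{H}}^n; J_n)$ is the mutual information between $X_{\mathcal{H}}^n$ and $J_n$. The privacy of the hidden source sequence $X_{\mathcal{H}}^n$ should also be protected when the encoded information $X_{\mathcal{E}}^n$ is observed by the encoder $f_n$. The measure of privacy for the encoder is defined as $e_n := \frac{1}{n} I(X_{\mathcal{H}}^n; X_{\mathcal{E}}^n)$ (13), where $I(X_{\mathcal{H}}^n; X_{\mathcal{E}}^n)$ is the mutual information between $X_{\mathcal{H}}^n$ and $X_{\mathcal{E}}^n$.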
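The coding rate and the two leakage measures can be evaluated directly for a small example. The following is a minimal sketch, assuming block length n = 1 and a hypothetical binary source in which the hidden bit equals the revealed bit flipped with probability 0.1; the helper names `mutual_information` and `coding_rate` are illustrative and logarithms are taken to base 2.

```python
import math
from collections import defaultdict

def mutual_information(joint):
    """Return I(A;B) in bits for a joint pmf given as {(a, b): probability}."""
    p_a, p_b = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        p_a[a] += p
        p_b[b] += p
    return sum(p * math.log2(p / (p_a[a] * p_b[b]))
               for (a, b), p in joint.items() if p > 0)

def coding_rate(num_codewords, n):
    """Coding rate (1/n) log2 M_n in bits per source symbol."""
    return math.log2(num_codewords) / n

# Hypothetical toy source with n = 1: X_R is a fair bit and X_H equals X_R
# flipped with probability 0.1. The encoder observes X_E = X_R and sends J = X_E.
p_hidden_and_encoded = {(h, e): 0.5 * (0.9 if h == e else 0.1)
                        for h in (0, 1) for e in (0, 1)}

rate = coding_rate(num_codewords=2, n=1)                    # one bit per symbol
encoder_leakage = mutual_information(p_hidden_and_encoded)  # I(X_H; X_E)
decoder_leakage = encoder_leakage                           # J = X_E here, so I(X_H; J) = I(X_H; X_E)
```

In this toy case the codeword reveals exactly what the encoder saw, so the two leakage measures coincide; a lossy code would in general make the decoder leakage smaller than the encoder leakage.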

Achievable Region and Theorem
We define the achievable region for the first-order rate analysis with the expected distortion and state the obtained results.

Definition 1.
A tuple (R, D, L, E) is said to be ε-achievable (with respect to the expected distortion measure) if, for any given ε > 0, there exists a sequence of codes $(f_n, g_n)$ satisfying $r_n \le R + \varepsilon$ (14), $u_n \le D + \varepsilon$ (15), $l_n \le L + \varepsilon$ (16), and $e_n \le E + \varepsilon$ (17) for all sufficiently large n.
The technical meaning of each constraint in Definition 1 can be interpreted as follows. Equation (14) evaluates how much the source sequence is compressed, so the coding rate should be small. Equation (15) requires the expected distortion to be at most $D + \varepsilon$; the smaller the distortion, the better the utility, so this quantity should also be small. Equation (16) constrains the amount of private information leaked to the decoder. Since private information should be kept secret from the receiver, this quantity should be small as well. Equation (17) constrains the amount of private information leaked to the encoder. For the same reason as (16), this quantity should also be small.
Remark 1. The minimum coding rate R for a fixed D corresponds to the rate-distortion function (Section 10 in [27]). Thus, in the proof of achievability, we evaluate the coding rate and the distortion with the standard argument in rate-distortion theory. This view is also important for correctly understanding the numerical results in Section 6.1.

Definition 2.
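The closure of the set of ε-achievable tuples (R, D, L, E) is referred to as the ε-achievable region and is denoted by $\mathcal{C}_{\mathcal{E}}(\varepsilon|P_{X_{\mathcal{K}}})$. We also define $\mathcal{C}_{\mathcal{E}}(P_{X_{\mathcal{K}}}) := \bigcap_{0<\varepsilon<1} \mathcal{C}_{\mathcal{E}}(\varepsilon|P_{X_{\mathcal{K}}})$ (18). To characterize the achievable region, we define the following informational region.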

Definition 3.
For any E such that R ⊆ E ⊆ K, S E (P X K ) is defined as We establish the next theorem. For the proof of this theorem, please refer to Sections 3.3-3.5.
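To make the role of the test channel concrete, the sketch below computes the four single-letter quantities that appear in the converse proof, namely $I(X_{\mathcal{E}}; \hat{X}_{\mathcal{R}})$, $\mathrm{E}[d(X_{\mathcal{R}}, \hat{X}_{\mathcal{R}})]$, $I(X_{\mathcal{H}}; \hat{X}_{\mathcal{R}})$, and $I(X_{\mathcal{H}}; X_{\mathcal{E}})$, for one test channel. It assumes that the informational region is described by these quantities (the display of Definition 3 is not reproduced here), and it uses a hypothetical binary source, a binary symmetric test channel, and Hamming distortion; the function `induced_point` is an illustrative name.

```python
import math
from collections import defaultdict

def mutual_information(joint):
    """Return I(A;B) in bits for a joint pmf given as {(a, b): probability}."""
    p_a, p_b = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        p_a[a] += p
        p_b[b] += p
    return sum(p * math.log2(p / (p_a[a] * p_b[b]))
               for (a, b), p in joint.items() if p > 0)

def induced_point(p_source, observe, channel, distortion):
    """Return (I(X_E;Xhat), E[d(X_R,Xhat)], I(X_H;Xhat), I(X_H;X_E)).

    p_source: {(x_r, x_h): probability}
    observe:  maps (x_r, x_h) to the encoder's observation x_e
    channel:  {(x_e, x_hat): P(x_hat | x_e)}, the test channel
    """
    j_e_hat, j_h_hat, j_h_e = defaultdict(float), defaultdict(float), defaultdict(float)
    expected_distortion = 0.0
    for (x_r, x_h), p in p_source.items():
        x_e = observe(x_r, x_h)
        j_h_e[(x_h, x_e)] += p
        for (e, x_hat), w in channel.items():
            if e != x_e or w == 0.0:
                continue
            # Xhat depends on the source only through X_E (Markov chain).
            j_e_hat[(x_e, x_hat)] += p * w
            j_h_hat[(x_h, x_hat)] += p * w
            expected_distortion += p * w * distortion(x_r, x_hat)
    return (mutual_information(j_e_hat), expected_distortion,
            mutual_information(j_h_hat), mutual_information(j_h_e))

# Hypothetical example: X_R is a fair bit, X_H = X_R flipped w.p. 0.1,
# only the revealed bit is encoded, and the test channel flips it w.p. q.
p_source = {(r, h): 0.5 * (0.9 if r == h else 0.1) for r in (0, 1) for h in (0, 1)}
q = 0.2
bsc = {(e, x): (1 - q if e == x else q) for e in (0, 1) for x in (0, 1)}
point = induced_point(p_source, lambda r, h: r, bsc, lambda r, x: float(r != x))
```

Sweeping `q` from 0 to 0.5 in this example traces how a larger distortion buys both a smaller rate and a smaller decoder leakage, while the encoder leakage stays fixed because it depends only on which attributes are encoded.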

Theorem 1.
For any E such that R ⊆ E ⊆ K, the achievable region of the coding system is given by To clarify the relationship with the conventional result of Shinohara and Yagi [3], we mention the achievable region among the coding rate, utility, and privacy, which is derived

Proof Preliminaries for First-Order Rate Analysis
As preliminaries for the coding theorems of the first-order rate analysis, we define strongly typical sequences, which are necessary for the proof, and show some of their properties. These preliminaries are also used in Section 4.
Definition 6 (Definition 2.1, [28]). The type of a sequence $x^n \in \mathcal{X}^n$ of length n is the distribution $P_{x^n}$ on $\mathcal{X}$ defined by $P_{x^n}(a) := \frac{1}{n} N(a|x^n)$ (25), where $N(a|x^n)$ represents the number of occurrences of the symbol $a \in \mathcal{X}$ in $x^n$. Likewise, the joint type of $x^n \in \mathcal{X}^n$ and $y^n \in \mathcal{Y}^n$ is the distribution $P_{x^n y^n}$ on $\mathcal{X} \times \mathcal{Y}$ defined by $P_{x^n y^n}(a, b) := \frac{1}{n} N(a, b|x^n, y^n)$ (26), where $N(a, b|x^n, y^n)$ represents the number of occurrences of $(a, b) \in \mathcal{X} \times \mathcal{Y}$ in the pair of sequences $(x^n, y^n)$.
Definition 7 ((Conditional Type), [28], Definition 2.2). We define the conditional type of $y^n$ given $x^n$ as a stochastic matrix $V : \mathcal{X} \to \mathcal{Y}$ satisfying $N(a, b|x^n, y^n) = N(a|x^n) V(b|a)$ (27). In particular, the conditional type of $y^n$ given $x^n$ is uniquely determined and given by $V(b|a) = \frac{N(a, b|x^n, y^n)}{N(a|x^n)}$ (28) if $N(a|x^n) > 0$ for every $a \in \mathcal{X}$.
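Definitions 6 and 7 translate directly into counting. The sketch below computes the type, the joint type, and the conditional type of short sequences; the function names are illustrative, and the conditional type is returned only for symbols a that actually occur in the conditioning sequence, where it is uniquely determined.

```python
from collections import Counter

def type_of(seq):
    """Type P_x(a) = N(a|x) / n of a sequence."""
    n = len(seq)
    return {a: count / n for a, count in Counter(seq).items()}

def joint_type(xs, ys):
    """Joint type P_xy(a, b) = N(a, b|x, y) / n of a pair of sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    return {ab: count / n for ab, count in Counter(zip(xs, ys)).items()}

def conditional_type(xs, ys):
    """Conditional type V(b|a) = N(a, b|x, y) / N(a|x), keyed by (a, b)."""
    n_a = Counter(xs)
    n_ab = Counter(zip(xs, ys))
    return {(a, b): count / n_a[a] for (a, b), count in n_ab.items()}
```

For example, with `xs = [0, 0, 1, 0]` and `ys = [1, 1, 0, 0]`, the symbol 0 occurs three times in `xs` and is paired with 1 twice, so the conditional type assigns V(1|0) = 2/3.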
Definition 8 ((Strongly Typical Sequences), [29], Definition 1.2.8). For any distribution P on $\mathcal{X}$, a sequence $x^n \in \mathcal{X}^n$ is said to be P-typical with constant δ > 0 if $\left| \frac{1}{n} N(a|x^n) - P(a) \right| \le \delta$ for every $a \in \mathcal{X}$ (29) and, in addition, no $a \in \mathcal{X}$ with P(a) = 0 occurs in $x^n$. The set of such sequences is denoted by $T_\delta^n(P)$. If X is a random variable with values in $\mathcal{X}$, we also refer to P-typical sequences as X-typical sequences and write $T_\delta^n(X)$.
Definition 9 ((Conditional Strongly Typical Sequences), [29], Definition 1.2.9). For a stochastic matrix $W : \mathcal{X} \to \mathcal{Y}$, a sequence $y^n \in \mathcal{Y}^n$ is said to be W-typical given $x^n \in \mathcal{X}^n$ with constant δ > 0 if $\left| \frac{1}{n} N(a, b|x^n, y^n) - \frac{1}{n} N(a|x^n) W(b|a) \right| \le \delta$ for every $a \in \mathcal{X}$, $b \in \mathcal{Y}$, and, in addition, $N(a, b|x^n, y^n) = 0$ whenever W(b|a) = 0. The set of such sequences $y^n$ is denoted by $T_\delta^n(W|x^n)$. Further, if X and Y are random variables with values in $\mathcal{X}$ and $\mathcal{Y}$, respectively, and $P_{Y|X} = W$, then such sequences are also said to be Y|X-typical, and the set is written as $T_\delta^n(Y|X|x^n)$.
Hereafter, the set of conditional strongly typical sequences T n δ (Y|X|x n ) is abbreviated as T n δ (Y|x n ). We state some lemmas that are used in this proof.
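Membership in the typical sets of Definitions 8 and 9 can be tested by counting symbols. The sketch below is illustrative only: for the conditional case it assumes the usual Csiszár and Körner form of the condition, namely that N(a, b|x, y)/n is within δ of N(a|x)W(b|a)/n for every pair (a, b), and it expects the stochastic matrix as a dictionary that lists every pair (a, b), including those with W(b|a) = 0.

```python
from collections import Counter

def is_typical(seq, p, delta):
    """Check whether seq is P-typical with constant delta (p maps symbols to probabilities)."""
    n = len(seq)
    counts = Counter(seq)
    # A symbol of probability zero must not occur at all.
    if any(p.get(a, 0.0) == 0.0 for a in counts):
        return False
    return all(abs(counts.get(a, 0) / n - p_a) <= delta for a, p_a in p.items())

def is_conditionally_typical(ys, xs, w, delta):
    """Check whether ys is W-typical given xs; w maps (a, b) to W(b|a)."""
    n = len(xs)
    n_a = Counter(xs)
    n_ab = Counter(zip(xs, ys))
    # A pair (a, b) with W(b|a) = 0 must not occur at all.
    if any(w.get(ab, 0.0) == 0.0 for ab in n_ab):
        return False
    return all(abs(n_ab.get((a, b), 0) / n - n_a.get(a, 0) / n * w_ba) <= delta
               for (a, b), w_ba in w.items())
```

A sequence such as `[0, 0, 1, 0]` is typical for P = (0.75, 0.25) with any positive δ, while `[0, 0, 0, 0]` fails for δ = 0.01 because its empirical frequency of 0 deviates by 0.25.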

Proof of Converse Part
In this part, we shall prove C E (P X K ) ⊆ S E (P X K ). Let a tuple (R, D, L, E) ∈ C E (P X K ) be arbitrarily fixed. Then, there exists an (n, 2 n(R+ ) , D + , L + , E + ) code that satisfies (14)- (17). Let Q be a uniform random variable over {1, 2, . . . , n} and let p i (x E ,i , x E c ,i ,x R,i ) be the conditional distribution given Q = i. Evaluating the inequalities for R, we obtain (a) follows from (14), (b) follows because H(J n ) ≤ log |J n | = log M n , (c) is due to the fact thatX n R = g(J n ), (d) follows because each X K,i is independent andX n R is a function of J n , (e) follows because conditioning reduces entropy, (f) is due to the definition of Q, (g) follows because X E ⊥ Q, and (h) follows because conditioning reduces entropy, where (X E , Similarly, evaluating D, L, and E, respectively, we obtain where (i) is due to (15), (16), (m) follows because i.i.d. P X n K , (n) follows becauseX n R = g(J n ), (o) follows from the fact that conditioning reduces entropy, (p) is derived from the definition of Q, and (q) follows because conditioning reduces entropy, where (X H , is due to chain rule for mutual information, (s), (t) follow because i.i.d. P X n K . It is readily shown that the Markov chain X E c -X E -X R holds (cf. Appendix A). We complete the proof of the converse part.

Proof of Direct Part
In this part, we provide a sketch of the proof of S E (P X K ) ⊆ C E (P X K ). Under an arbitrarily fixed distribution P X E ,X E c · PX R |X E , any tuple (R, D, L, E) ∈ S E (P X K ) is chosen such that From (42) and (43), we can choose a sufficiently small > 0 such that In addition, with this , some constant 0 < τ < 1 2 is fixed such that We can also choose positive numbers δ(:= δ(n)) such that Generation of codebook: Randomly generatex n R (j) from the strongly typical sequences T n δ (X R ) for j = 1, 2, . . . , M n := 2 nR . Reveal the codebook C = {x n R (1), . . . ,x n R (M n )} to the encoder and decoder.
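Encoding: If a sequence $x_{\mathcal{E}}^n \in \mathcal{X}_{\mathcal{E}}^n$ satisfies $x_{\mathcal{K}}^n = (x_{\mathcal{E}}^n, x_{\mathcal{E}^c}^n)$ with some $x_{\mathcal{E}^c}^n \in \mathcal{X}_{\mathcal{E}^c}^n$, we write $x_{\mathcal{E}}^n \prec x_{\mathcal{K}}^n$. Given $x_{\mathcal{K}}^n$, the encoder finds j such that $x_{\mathcal{E}}^n \in T_\delta^n(X_{\mathcal{E}}|\hat{x}_{\mathcal{R}}^n(j))$ and sets $f_n(x_{\mathcal{E}}^n) = j$, where $T_\delta^n(X_{\mathcal{E}}|\hat{x}_{\mathcal{R}}^n(j))$ is the set of conditional strongly typical sequences. If there exist multiple such j, $f_n(x_{\mathcal{E}}^n)$ is set to the minimum one. If there is no such j, then $f_n(x_{\mathcal{E}}^n) = M_n$.
Decoding: When j is observed, the decoder sets the reproduced sequence as $\hat{X}_{\mathcal{R}}^n = \hat{x}_{\mathcal{R}}^n(j)$.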
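The encoding and decoding rules above can be sketched as a lookup over a given codebook. This is an illustration under stated assumptions rather than the construction used in the proof: the codebook is passed in as a fixed list instead of being drawn at random from the typical set, the conditional typicality test uses the standard form of Definition 9, and `reverse_channel` is a hypothetical dictionary mapping (codeword symbol, source symbol) to the conditional probability of the source symbol given the codeword symbol.

```python
from collections import Counter

def conditionally_typical(seq, given, w, delta):
    """True if seq is W-typical given `given`; w maps (a, b) to W(b|a)."""
    n = len(given)
    n_a = Counter(given)
    n_ab = Counter(zip(given, seq))
    if any(w.get(ab, 0.0) == 0.0 for ab in n_ab):
        return False
    return all(abs(n_ab.get((a, b), 0) / n - n_a.get(a, 0) / n * w_ba) <= delta
               for (a, b), w_ba in w.items())

def encode(x_e, codebook, reverse_channel, delta):
    """Return the smallest 1-based index j such that x_e is conditionally
    typical given codeword j; return M_n = len(codebook) if there is none."""
    for j, codeword in enumerate(codebook, start=1):
        if conditionally_typical(x_e, codeword, reverse_channel, delta):
            return j
    return len(codebook)

def decode(j, codebook):
    """Return the reproduced sequence, i.e., codeword j (1-based)."""
    return codebook[j - 1]
```

With a noiseless `reverse_channel` and the codebook `[(0,0,0,0), (1,1,1,1), (0,0,1,1)]`, the input `(0,0,1,1)` is mapped to index 3, while an input with no typical match such as `(0,1,0,1)` falls back to the last index M_n = 3, exactly as in the rule above.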
For sufficiently large n, we can show that there exists a code ( f n , g n ) such that (cf. Appendix C) For this code ( f n , g n ), we evaluate the privacy leakage against the decoder as where (a) follows because of i.i.d. P X n K , (b) is due to the inequality proved in Appendix D, (c) follows by removing the term for j = M n .
Here, for any x n H satisfying x n K = (x n R , x n H ) ∈Ã(j) with some x n R , we can show that where (d) follows from the fact that (e) is due to the inequality proved in Appendix E, and (f) follows because of the number of strongly typical sequences.
Therefore, from Equations (61), (64) and (66) we can obtain Since constants , τ, and δ are fixed to satisfy (45)-(48), from (44), (57)-(59) and (67), we obtain Therefore, for the fixed distribution P X E ,X E c · PX R |X E any tuple R |X E is fixed arbitrarily. We complete the proof of the direct part.

Performance Measures
Hereafter, let the pair of encoder and decoder $(f_n, g_n)$ be fixed. For a given $M_n$, the coding rate is defined as $r_n := \frac{1}{n} \log M_n$. Then, the measure of utility is defined as $\Pr\left\{ \frac{1}{n} d(X_{\mathcal{R}}^n, \hat{X}_{\mathcal{R}}^n) > D \right\}$. This measure is called the excess-distortion probability for D ≥ 0.
In this system, the privacy of the hidden source sequence $X_{\mathcal{H}}^n$ should be protected when the codeword $J_n$ is observed by the decoder $g_n$. The measure of privacy for the decoder is defined as $l_n := \frac{1}{n} I(X_{\mathcal{H}}^n; J_n)$ (76), where $I(X_{\mathcal{H}}^n; J_n)$ is the mutual information between $X_{\mathcal{H}}^n$ and $J_n$. The privacy of the hidden source sequence $X_{\mathcal{H}}^n$ should also be protected when the encoded information $X_{\mathcal{E}}^n$ is observed by the encoder $f_n$. The measure of privacy for the encoder is defined as $e_n := \frac{1}{n} I(X_{\mathcal{H}}^n; X_{\mathcal{E}}^n)$ (77), where $I(X_{\mathcal{H}}^n; X_{\mathcal{E}}^n)$ is the mutual information between $X_{\mathcal{H}}^n$ and $X_{\mathcal{E}}^n$.

Achievable Region and Theorem
We define the achievable region for the first-order rate analysis with the excess-distortion probability and state the obtained results.

Definition 10.
A tuple (R, D, L, E) is said to be -achievable (with respect to the excess-distortion probability) if, for any given > 0, there exists a sequence of codes ( f n , g n ) satisfying for all sufficiently large n.
The technical meaning of each constraint in Definition 10 can be interpreted as follows. Equation (78) evaluates how much the source sequence is compressed, so the coding rate should be small. Equation (79) requires the excess-distortion probability to be bounded by ε, so this probability should also be small. Equation (80) constrains the amount of private information leaked to the decoder. Since private information should be kept secret from the receiver, this quantity should be small as well. Equation (81) constrains the amount of private information leaked to the encoder. For the same reason as (80), this quantity should also be small.

Definition 11.
The closure of the set of -achievable tuples (R, D, L, E) is referred to as theachievable region and is denoted by L E ( |P X K ) and define We establish the following theorem. For the proof of this theorem, please refer to Sections 4.3 and 4.4.

Theorem 2.
For any E such that R ⊆ E ⊆ K, the achievable region of the coding system is given by Remark 4. From Theorems 1 and 2, we find that the achievable region in which utility is measured by the expected distortion is equal to the one in which utility is measured by the excess-distortion probability.
Because in Section 6 we discuss the achievable region among coding rate, utility, and privacy, a characterization of the achievable region is derived by projecting the characterization in Theorem 2 onto the R-D-L hyperplane.

Definition 12.
For any E such that R ⊆ E ⊆ K, we define

Corollary 2.
For any E such that R ⊆ E ⊆ K, the region L RDL Examples of numerical calculation of this result are shown in Section 6.1.
Since we focus on the achievable region between utility and privacy in the next section, a characterization of the achievable region is derived by further projecting the result of Theorem 2 onto the D-L plane.

Definition 14.
For any E such that R ⊆ E ⊆ K, we define

Proof of Converse Part
From Section 3.4 (proof of the converse part), we have Let a tuple (R, D, L, E) ∈ L E (P X K ) be arbitrarily fixed and > 0 and > 0 be given. From the argument of the method of types, the sequences x n R are divided into two categories: distortion-typical or non-distortion-typical with somex n R . The sequences of the former categories satisfy 1 n d(x n R ,x n R ) < D + and the sequences of the latter one satisfy . Then, the expected distortion is bounded from above as where (a) follows from (79) of -achievable in which utility is measured by the excessdistortion probability. Since + d max can be arbitrarily small with proper choices of and , (15) can be derived. This means From both inclusion relations, is evidently satisfied.
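The key step in this argument is that a small excess-distortion probability forces a small expected distortion when the distortion is bounded. The sketch below checks the underlying inequality on a list of normalized distortion values: the mean is at most the threshold times the probability of not exceeding it, plus the maximum distortion times the excess probability. The function name, the sample values, and the bound `d_max` are illustrative assumptions.

```python
def excess_distortion_bound(distortions, threshold, d_max):
    """Return (mean, excess probability, upper bound on the mean).

    distortions: normalized distortion values (1/n) d(x, xhat), each in [0, d_max]
    threshold:   distortion level whose exceedance counts as an excess event
    """
    n = len(distortions)
    p_excess = sum(z > threshold for z in distortions) / n
    mean = sum(distortions) / n
    # Values not exceeding the threshold contribute at most `threshold` each,
    # and the remaining values contribute at most `d_max` each.
    bound = threshold * (1 - p_excess) + d_max * p_excess
    return mean, p_excess, bound

mean, p_excess, bound = excess_distortion_bound([0.1, 0.2, 0.15, 0.9],
                                                threshold=0.25, d_max=1.0)
```

Since the bound is itself at most the threshold plus `d_max` times the excess probability, driving the excess probability to zero brings the expected distortion arbitrarily close to the threshold, which is the implication used in the proof.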

Proof of the Direct Part
In this part, we provide a sketch of the proof of S E (P From (97) and (98), we can choose a sufficiently small > 0 such that In addition, with this , some constant 0 < τ < 1 2 is fixed such that We can also choose positive numbers δ(:= δ(n)) such that as n → ∞. Let δ(n) = c √ n log n where c is a constant, and obviously (104) and (105) are satisfied.
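Generation of codebook: Randomly generate $\hat{x}_{\mathcal{R}}^n(j)$ from the strongly typical sequences $T_\delta^n(\hat{X}_{\mathcal{R}})$ for $j = 1, 2, \ldots, M_n := 2^{nR}$. Reveal the codebook $\mathcal{C} = \{\hat{x}_{\mathcal{R}}^n(1), \ldots, \hat{x}_{\mathcal{R}}^n(M_n)\}$ to the encoder and decoder.
Encoding: Given $x_{\mathcal{K}}^n$, the encoder finds j such that $x_{\mathcal{E}}^n \in T_\delta^n(X_{\mathcal{E}}|\hat{x}_{\mathcal{R}}^n(j))$ and sets $f_n(x_{\mathcal{E}}^n) = j$, where $T_\delta^n(X_{\mathcal{E}}|\hat{x}_{\mathcal{R}}^n(j))$ is the set of conditional strongly typical sequences. If there exist multiple such j, $f_n(x_{\mathcal{E}}^n)$ is set to the minimum one. If there is no such j, then $f_n(x_{\mathcal{E}}^n) = M_n$.
Decoding: When j is observed, the decoder sets the reproduced sequence as $\hat{X}_{\mathcal{R}}^n = \hat{x}_{\mathcal{R}}^n(j)$.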
For sufficiently large n, we can show that there exists a code ( f n , g n ) such that (cf. Appendix F) Pr For this code ( f n , g n ), we evaluate the privacy leakage against the decoder as where (a) follows because of i.i.d. P X n K , (b) is due to the inequality proved in Appendix D, and (c) follows by removing the term for j = M n .
Here, for any x n H satisfying x n K = (x n R , x n H ) ∈Ã(j) with some x n R , we can show that where (d) follows from the fact that Since constants , τ, and δ are fixed to satisfy (100)-(102), from (111), (113), and (122), we obtain Therefore, for the fixed distribution P X E ,X E c · PX R |X E , any tuple is achievable. Consequently, S * E (P X K ) ⊆ L E ( |P X K ). Taking the closure for the l.h.s., we obtain Cl(S * E (P R |X E is fixed arbitrarily. We complete the proof of the direct part.

Another Expression of the Achievable Region
In Section 5.1, we clarify that the achievable region L DL E (P X K ) defined in (89) coincides with the region expressed with a tangent plane.

Theorem 3.
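For any $\mathcal{E}$ such that $\mathcal{R} \subseteq \mathcal{E} \subseteq \mathcal{K}$, the region $\mathcal{S}^{DL}_{\mathcal{E}}(P_{X_{\mathcal{K}}})$ defined in (90) is given by $\mathcal{S}^{DL}_{\mathcal{E}}(P_{X_{\mathcal{K}}}) = \mathcal{T}^{DL}_{\mathcal{E}}(P_{X_{\mathcal{K}}})$ (128), and the achievable region $\mathcal{L}^{DL}_{\mathcal{E}}(P_{X_{\mathcal{K}}})$, which is the projection of the achievable region $\mathcal{L}_{\mathcal{E}}(P_{X_{\mathcal{K}}})$ onto the D-L plane, is given by $\mathcal{L}^{DL}_{\mathcal{E}}(P_{X_{\mathcal{K}}}) = \mathcal{T}^{DL}_{\mathcal{E}}(P_{X_{\mathcal{K}}})$ (129).
Proof. Figure 3 illustrates the idea of the proof. Let a constant µ ≥ 0 be fixed arbitrarily. As in Figure 3, there exists a boundary point $(D_\mu, L_\mu)$ of $\mathcal{S}^{DL}_{\mathcal{E}}$ at which the line with slope −µ is tangent. The intercept of this tangent line is $L_\mu + \mu D_\mu$. The minimum of $I(X_{\mathcal{H}}; \hat{X}_{\mathcal{R}}) + \mu \mathrm{E}[d(X_{\mathcal{R}}, \hat{X}_{\mathcal{R}})]$ over the distributions $P_{\hat{X}_{\mathcal{R}}|X_{\mathcal{E}}}$ coincides with $L_\mu + \mu D_\mu$. Therefore, From (130), we obtain (131) Taking the intersection over µ ≥ 0 on both sides of (131), The l.h.s. of (131) is the upper-right region in the first quadrant bounded by the tangent line with slope −µ for $\mathcal{S}^{DL}_{\mathcal{E}}(P_{X_{\mathcal{K}}})$. Since the l.h.s. of (132) is the intersection of the l.h.s. of (131) over µ ≥ 0, the l.h.s. of (132) represents $\mathcal{S}^{DL}_{\mathcal{E}}(P_{X_{\mathcal{K}}})$. From Definition 16, the right-hand side (r.h.s.) of (132) is $\mathcal{T}^{DL}_{\mathcal{E}}(P_{X_{\mathcal{K}}})$. As a result, (128) holds. Since $\mathcal{L}^{DL}_{\mathcal{E}}(P_{X_{\mathcal{K}}}) = \mathcal{S}^{DL}_{\mathcal{E}}(P_{X_{\mathcal{K}}})$ from Corollary 3, (129) likewise holds.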
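The intercept of the tangent line with slope −µ is the minimum of $I(X_{\mathcal{H}}; \hat{X}_{\mathcal{R}}) + \mu \mathrm{E}[d(X_{\mathcal{R}}, \hat{X}_{\mathcal{R}})]$ over test channels, and for small alphabets it can be approximated by brute force. The sketch below is a rough illustration, assuming a hypothetical binary source in which only the revealed bit is encoded, Hamming distortion, and a coarse grid over binary test channels; `supporting_value` is an illustrative name, and a grid search only upper-bounds the true minimum.

```python
import math
from collections import defaultdict

def mutual_information(joint):
    """Return I(A;B) in bits for a joint pmf given as {(a, b): probability}."""
    p_a, p_b = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        p_a[a] += p
        p_b[b] += p
    return sum(p * math.log2(p / (p_a[a] * p_b[b]))
               for (a, b), p in joint.items() if p > 0)

def supporting_value(p_source, mu, grid=20):
    """Grid-search the minimum of I(X_H; Xhat) + mu * E[d(X_R, Xhat)].

    p_source: {(x_r, x_h): probability} with binary x_r; the encoder sees x_r only.
    """
    best = float("inf")
    for i in range(grid + 1):
        for k in range(grid + 1):
            # prob_one[x_r] = P(Xhat = 1 | X_R = x_r) defines the test channel.
            prob_one = {0: i / grid, 1: k / grid}
            joint_h_hat = defaultdict(float)
            distortion = 0.0
            for (x_r, x_h), p in p_source.items():
                for x_hat in (0, 1):
                    w = prob_one[x_r] if x_hat == 1 else 1.0 - prob_one[x_r]
                    joint_h_hat[(x_h, x_hat)] += p * w
                    distortion += p * w * (x_r != x_hat)
            best = min(best, mutual_information(joint_h_hat) + mu * distortion)
    return best

# Hypothetical source: X_R is a fair bit and X_H = X_R flipped w.p. 0.1.
p_source = {(r, h): 0.5 * (0.9 if r == h else 0.1) for r in (0, 1) for h in (0, 1)}
intercepts = {mu: supporting_value(p_source, mu) for mu in (0.0, 1.0, 5.0)}
```

Each value of µ yields one supporting line L + µD ≥ intercept, and intersecting the corresponding half-planes over a range of µ gives an outer polygonal approximation of the D-L region in this toy setting.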

Proof Preliminaries
In Section 5.2, we derive two fundamental properties of the minimization about two values and the inequalities about entropy and divergence that are used to prove the strong converse theorem. In Proposition 1, we rewrite the objective function T µ E (P X K ) of the region expressed with the tangent plane introduced in Section 5.1 in terms of divergence.
Proposition 1. Let µ ≥ 0 be fixed arbitrarily. For any E such that R ⊆ E ⊆ K, where T µ,α and Q X E c X EXR is the distribution induced from each PX be the distribution that minimizes the r.h.s. of (134) and Q α is non-negative and is bounded above, by setting a = log |X H | + D max , it must hold that αD(P α Notice that the set of all probability distributions on a finite alphabet forms a compact set. Because G(P α ) is a continuous function over a compact set, it is also uniformly continuous. Then, there exists a function ∆(t) satisfying ∆(t) → 0 as t → 0 such that Consequently, we obtain the desired inequality T µ where (a) follows from the memoryless property of the i.i.d. source P X n The third term can be bounded from below as where (c) follows from the data processing inequality and (d) holds because of Jensen's inequality.

The sum of the first and second terms satisfies where (f) follows from the log sum inequality.

Strong Converse Theorem
We shall establish the strong converse theorem, which is the main result of this section. Before proving the theorem, we state a lemma that is the key tool in the proof. It relates the single-letterized function $T^{\mu,\alpha}_{\mathcal{E}}(P_{X_{\mathcal{K}}})$ to the function $T^{\mu,\alpha}_{\mathcal{E}}(P^n_{X_{\mathcal{K}}})$, both of which are introduced in Proposition 1.

Lemma 7.
For any E such that R ⊆ E ⊆ K, all n ∈ N, µ ≥ 0 and α > 0, it holds that As the main theorem of this section, we show the strong converse theorem for the utility-privacy trade-offs.

Theorem 4. Strong converse theorem:
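For any $\mathcal{E}$ such that $\mathcal{R} \subseteq \mathcal{E} \subseteq \mathcal{K}$ and all 0 < ε < 1, it holds that $\mathcal{L}^{DL}_{\mathcal{E}}(\varepsilon|P_{X_{\mathcal{K}}}) = \mathcal{L}^{DL}_{\mathcal{E}}(P_{X_{\mathcal{K}}})$.
Remark 5. Theorem 4 indicates that, regardless of the value of ε, the region $\mathcal{L}^{DL}_{\mathcal{E}}(\varepsilon|P_{X_{\mathcal{K}}})$ is equal to $\mathcal{L}^{DL}_{\mathcal{E}}(P_{X_{\mathcal{K}}})$.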

Proof of Lemma 7
Lemma 7 indicates that the function $T^{\mu,\alpha}_{\mathcal{E}}(P^n_{X_{\mathcal{K}}})$, whose argument $P^n_{X_{\mathcal{K}}}$ is a probability distribution over $\mathcal{X}^n_{\mathcal{K}}$, can be lower-bounded by n times the single-letterized function $T^{\mu,\alpha}_{\mathcal{E}}(P_{X_{\mathcal{K}}})$. Before describing the detailed proof, we state its outline: (i) We first express the function $T^{\mu,\alpha}_{\mathcal{E}}(P^n_{X_{\mathcal{K}}})$ as the maximum of the difference of two functions denoted by $G_1$ and $G_2$, as in (142). (ii) Then, we show that the first function $G_1$ can be lower-bounded by n times its single-letterized function, as in (143), while the second function $G_2$ can be upper-bounded by n times its single-letterized function, as in (147). This outline is similar to the proof of Theorem 4 in [16], with a slight modification of the function $G_2$.
For a given distribution PX n E cX n EX n R , let functions G 1 (PXn Using these functions, and in view of (134), T µ,α For fixed PX n E cX n EX n R , from Proposition 2, it holds that The second term of (141) can be expressed as follows: where (a) follows from 1 n ∑ n j=1 PX Moreover, for the third term of (141), it holds that From (144)-(146), we obtain Consequently, since (143) and (147) are satisfied for an arbitrary PX n E cX n EX n R , the proof is completed.

Proof of Strong Converse Theorem
For any given > 0, fix the rate pair (D, L) ∈ L DL E ( |P X K ) arbitrarily. Then, by definition, there exists a code ( f n , g n ) satisfying (79) and (80). For this code ( f n , g n ), a set D is defined as We derive a distribution PXn It is obvious that the excess-distortion probability measured by PXn Thus, by imitating the proof approach of the standard weak converse theorem, it holds that From (148), the following equation is obtained: where (a) follows from (149) and I(X n E c ;X n R |X n E ) = 0, (b) is due to (134).
from Proposition 1, it holds that for an arbitrary α > 0, Hence, it holds that For the set of (D, L) satisfying (150), varying µ ≥ 0 arbitrarily and taking the intersection, we have From Theorem 3, the r.h.s. of (151) is equal to L DL E (P X K ). This proof is completed.

Numerical Calculation of Coding Rate, Utility, and Privacy for Decoder
In this section, we show some numerical calculations of the achievable regions C RDL E (P X K ) and L RDL E (P X K ) in Corollaries 1 and 2, respectively. In general, it is difficult to compute these regions. Nevertheless, to obtain some insight, let us consider three tractable but essential cases. In these calculations, the number of public attributes is one (|R| = 1) and the number of private attributes is two (|H| = 2). We assume that each of the attributes is binary. Here, note again that the coding rate R acts like the rate-distortion function in rate-distortion theory (cf. Section 10 in [27]). For fixed D and L, a smaller coding rate is better.
In the first example, we calculated the L-D graph of the theoretical limits in case (i) E = K, case (ii) E = R, and case (iii) R ⊂ E ⊂ K (Figure 4). The achievable privacy leakage L becomes small as D becomes large if we do not impose any restriction on the value of R. For a given D, the privacy leakage for the decoder is the smallest in case (i) E = K and the largest in case (ii) E = R. In the second example, we calculated the R-D graph of the theoretical limits in cases (i), (ii), and (iii) (Figure 5). We can see that the minimum coding rates for a given D coincide in all cases if we do not impose any restriction on the value of L. In the third example, we calculated the optimal privacy leakage L for fixed D and the corresponding coding rates R in cases (i), (ii), and (iii) (Tables 1-3). The optimal privacy leakage in cases (i) and (iii) is smaller than the one in case (ii), whereas the coding rates that attain the optimal privacy leakage in cases (i) and (iii) are larger than the one in case (ii).
Next, we discuss these results. Comparing the cases in Figure 4, we can verify that for a given D, the more private information is encoded, the smaller the achievable minimum privacy leakage is. Figure 5 suggests that if the coding rate should be minimized, it suffices to encode only the public attributes. This result is evident from Corollaries 1 and 2 because the condition on the choice of the test channel $P_{\hat{X}_{\mathcal{R}}|X_{\mathcal{E}}}$ in case (i) is weaker than the one in case (ii): a test channel that is admissible in case (ii) is also admissible in case (i). Hence, the achievable region in case (ii) is contained in the ones in cases (i) and (iii), whereas the converse does not hold. From Tables 1-3, comparing the cases, we can confirm the trade-off between the optimal privacy leakage L for a fixed D and the corresponding coding rate R.
Summarizing the foregoing arguments, we have discussed the relationship between utility and privacy in Figure 4, the one between utility and coding rate in Figure 5, and the one between privacy and coding rate in Tables 1-3. From the discussion about Figure 5, some readers may suspect that case (i) is the best choice of encoded information because the achievable regions in cases (ii) and (iii) are contained in the one in case (i). This is true if we do not consider the leakage for the encoder. However, it is not true if we consider the leakage for the encoder, that is, the measure of privacy for the encoder (see (13) or (77)). In the next section, we discuss this point in detail.

Significance of Limited Leakage for Encoder
In this section, we discuss the significance of evaluating the leakage for the encoder. Our goal of this discussion is to show that the best-encoded information may be case (iii) R ⊂ E ⊂ K if we take the limited leakage for the encoder into consideration.
The first issue is the amount of encoded information. Some readers may think that the more information is input to the encoder, the better. However, there are pros and cons.

Pros: The achievable regions C RDL E (P X K ) and L RDL E (P X K ) become larger.
Cons: The leakage for the encoder increases.
From this point of view, we arrive at the idea that the best-encoded information can lie in case (iii) R ⊂ E ⊂ K if we impose some constraint on the leakage for the encoder. This idea is the key point of this paper.
The second issue is the significance of the limited leakage for the encoder. Figure 6 shows the Hasse diagram, which represents the inclusion relation about the index sets of attributes. The Hasse diagram is often used to represent inclusion relations, for example, R ⊂ E 2 ⊂ E 1 ⊂ K.
We can also regard Figure 6 as the Hasse diagram that represents the inclusion relation for the achievable regions C RDL E (P X K ) and L RDL E (P X K ), because the index sets of attributes (R ⊆ E ⊆ K) correspond to the encoded information (X E ), and the encoded information corresponds to the achievable regions. In addition, the diagram in Figure 6 has another property: the superordinate sets have a larger amount of privacy leakage for the encoder than the subordinate sets, since the index sets of attributes also determine the privacy leakage for the encoder.
Let us consider a practical application. We assume that the data aggregator, that is, the encoder, gathers encoded information from application users and hopes to improve the utility of the application while limiting the amount of leakage about X n H to at most E ≥ 0, that is, e n ≤ E. More precisely, for a given E, we want to find which subsets of K are sufficient to characterize where C E (P X K ) and L E (P X K ) are defined in Definitions 2 and 11, respectively. The process is as follows.
Step 1: Check the user's requirements and impose the restriction on the privacy leakage for the encoder.
Step 2: Check the inclusion relation between index sets. Figure 8 shows the Hasse diagram for Step 2. From Figure 6, we can find that Therefore, the index sets R, E 3 , and E 5 are not suitable as the index sets of encoded information. Figure 9 shows the Hasse diagram obtained after Step 2. From Figure 9, the remaining index sets are E 2 and E 4 .
Therefore, if we impose a restriction on the privacy leakage for the encoder, the index sets E 2 and E 4 form the Pareto set in this multi-objective optimization problem. In other words, there exists a system that satisfies the user's requirement E on the maximum amount of leakage to the encoder, and the achievable regions are given by C RDL (P X K |E) = C RDL E 2 (P X K ) ∪ C RDL E 4 (P X K ) and L RDL (P X K |E) = L RDL E 2 (P X K ) ∪ L RDL E 4 (P X K ). From the discussion above, we conclude that the best-encoded information can be case (iii) R ⊂ E ⊂ K if we take the limited leakage for the encoder into account. This concept is one of the most important novelties of this paper.
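The two steps above can be sketched as a small search over index sets: keep the sets whose encoder leakage $I(X_{\mathcal{H}}; X_{\mathcal{E}})$ is within the budget, then discard every set that is strictly contained in another admissible set. The source below is a hypothetical example with one public and two private binary attributes (it is not the example of Figures 6 to 9), and all function names are illustrative.

```python
import math
from itertools import combinations, product
from collections import defaultdict

def mutual_information(joint):
    """Return I(A;B) in bits for a joint pmf given as {(a, b): probability}."""
    p_a, p_b = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        p_a[a] += p
        p_b[b] += p
    return sum(p * math.log2(p / (p_a[a] * p_b[b]))
               for (a, b), p in joint.items() if p > 0)

def encoder_leakage(p_source, hidden, encoded):
    """I(X_H; X_E) for attribute index lists `hidden` and `encoded`."""
    joint = defaultdict(float)
    for x, p in p_source.items():
        joint[(tuple(x[i] for i in hidden), tuple(x[i] for i in encoded))] += p
    return mutual_information(joint)

def admissible_maximal_sets(p_source, revealed, hidden, budget):
    """Step 1: keep index sets E (revealed plus some hidden attributes) whose
    encoder leakage is within the budget. Step 2: keep only the maximal ones."""
    candidates = []
    for k in range(len(hidden) + 1):
        for extra in combinations(hidden, k):
            e = frozenset(revealed) | frozenset(extra)
            if encoder_leakage(p_source, hidden, sorted(e)) <= budget + 1e-12:
                candidates.append(e)
    return [e for e in candidates if not any(e < other for other in candidates)]

# Hypothetical source: attribute 0 is public and fair, attribute 1 equals
# attribute 0 flipped w.p. 0.1, and attribute 2 is an independent fair bit.
p_source = {(x0, x0 ^ flip, x2): 0.5 * (0.9 if flip == 0 else 0.1) * 0.5
            for x0, flip, x2 in product((0, 1), repeat=3)}
best_sets = admissible_maximal_sets(p_source, revealed=[0], hidden=[1, 2], budget=1.6)
```

In this toy source the leakage budget of 1.6 bits rules out encoding all attributes, and the two surviving maximal sets are incomparable, which mirrors the situation in which the achievable region is a union over two index sets.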
If E satisfies some condition, then C RDL (P X K |E) can be characterized by the expressions given by Yamamoto [1] (cf. Remark 3). More specifically, the region C RDL (P X K |E) can be given by where the regions S RDL K (P X K ) and S RDL R (P X K ) are given in [1] (cf. Remark 3).

Discussion on Measures for Privacy Leakage
This paper adopts the mutual information as the measure of privacy leakage, as in (12), (13), (76), and (77). However, some unlikely data can be leaked even though the database satisfies the theoretical limit of privacy leakage. For example, let (X, Y) be a pair of correlated random variables for which I(X; Y) is very small. There may nevertheless exist a pair (x 1 , y 1 ) such that Y = y 1 implies X = x 1 with high probability. To put it differently, the receiver can tell the value of X if it observes Y = y 1 . The theoretical limit evaluated with mutual information cannot prevent such a scenario. To circumvent it, other measures adopted in related studies can be used. Promising candidates are Rényi information of higher orders [30], maximal leakage [15], and maximal α-leakage [16][17][18]21].
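A concrete instance of this scenario is easy to construct. In the sketch below, with hypothetical symbols and a hypothetical parameter `rho`, the outcome y1 is emitted only when X = x1 and only with probability `rho`, so the mutual information is tiny while observing y1 identifies X exactly.

```python
import math
from collections import defaultdict

def mutual_information(joint):
    """Return I(A;B) in bits for a joint pmf given as {(a, b): probability}."""
    p_a, p_b = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        p_a[a] += p
        p_b[b] += p
    return sum(p * math.log2(p / (p_a[a] * p_b[b]))
               for (a, b), p in joint.items() if p > 0)

rho = 0.01  # probability that the revealing outcome y1 is emitted when X = x1
joint = {("x0", "y0"): 0.5,
         ("x1", "y0"): 0.5 * (1 - rho),
         ("x1", "y1"): 0.5 * rho}

leak = mutual_information(joint)                       # average leakage in bits
p_y1 = sum(p for (x, y), p in joint.items() if y == "y1")
posterior = joint[("x1", "y1")] / p_y1                 # P(X = x1 | Y = y1)
```

Here the average leakage is below 0.01 bits, yet the posterior of x1 given y1 equals one, which is exactly the kind of rare but complete disclosure that an average-case measure does not rule out.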

Conclusions
In this paper, we strengthened the results in [3] mainly by establishing three coding theorems in a privacy-constrained source coding problem. In Sections 3 and 4, two theorems are made about the first-order rate analysis in which utility is measured by the expected distortion or the excess-distortion probability for case (iii), R ⊂ E ⊂ K. The novelty is the introduction of the measure of privacy for the encoder along with the use of the excess-distortion probability. The obtained characterization reduces to the one given in [3] derived based on the expected distortion when the leakage for the encoder is not limited, and the result shows that employing an excess-distortion probability does not change the achievable region from the one with an expected distortion. In Section 5, we establish the strong converse theorem for utility-privacy trade-offs. Although the described result is for the projected plane of utility and privacy for the decoder for simplicity, we can also incorporate the measure of privacy for the encoder. Finally, we discuss the significance of the encoded information considering limited leakage for the encoder. The argument suggests that the best-encoded information can be case (iii) R ⊂ E ⊂ K if some constraint is imposed on the privacy leakage for the encoder.
As future work, the second-order rate analysis for utility-privacy trade-offs is an interesting research topic [4][5][6]. The strong converse theorem and the second-order rate analysis for the four-dimensional region of coding rate, utility, privacy for the decoder, and privacy for the encoder are more challenging tasks. It is also worth analyzing the achievable region with other privacy measures such as Rényi information [30], maximal leakage [15], and maximal α-leakage [16][17][18][21]. This paper analyzed the theoretical limits of coding, but how to achieve these limits remains open; the construction of good codes is therefore also an important subject. Extensions of this paper's scenario to coding with side information [2,25] are also of interest.
where (a) is due to the Markov chain $X^n_{E^c} - X^n_E - \hat{X}_{R,i}$ and (b) follows from the stationary memoryless source. Therefore, we obtain the Markov chain $X_{E^c,i} - X_{E,i} - \hat{X}_{R,i}$. For the marginal distribution, we can show that where (c) follows because (d) is due to the Markov chain $X_{E^c,i} - X_{E,i} - \hat{X}_{R,i}$, (e) follows from the stationary memoryless source, and (f) follows because Therefore, we obtain the Markov chain $X_{E^c} - X_E - \hat{X}_R$. This completes the proof.

Appendix B. Proof of Equation (56)
From $\tilde{A}(j) \subseteq B(j)$ for $j = 1, 2, \ldots, M_n - 1$, and thus we have $x^n_{E^c} \notin T^n_\delta(X_{E^c} \mid x^n_E, \hat{x}^n_R(j))$ from Lemma 5. Then, we can prove that where (a) is due to the Markov chain $X^n_{E^c} - X^n_E - \hat{X}^n_R$ and (b) follows from Lemma 6.
From Equations (A5) and (A7), we obtain the claim, which completes the proof of (56).

Appendix C. Proof of Existence of Code Satisfying Equations (57)-(62)
We first set $M_n := 2^{nR}$ and $r_n := \frac{1}{n} \log M_n$. Then, we obviously have (57). From the union upper bound, From Lemma 6, the first term in (A9) is bounded as We consider the expectation of the second term in (A9) by random coding. Hereafter, we denote the random variable corresponding to the reproduced sequence $\hat{x}^n_R(j)$ as $\hat{X}^n_R(j)$. For notational simplicity, we use the abbreviation and then where (a) is owing to (A11), (b) is due to the symmetry of random coding with respect to the indices, (c) follows in the same way as in Section 3.6.3 of [31], and (d) holds because $\delta$ is fixed to satisfy (49).
From (A10) and (A12), we obtain Therefore, there exists at least one codebook satisfying (60) in the ensemble obtained by random coding.
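Random-coding arguments of this kind typically bound the probability that none of the independently drawn codewords is jointly typical with a given source sequence. The sketch below illustrates one standard form of that step, $(1 - q)^{M} \le e^{-qM}$, under assumptions that are not taken from this paper: `q` stands for a hypothetical per-codeword success probability of the form $2^{-n(I + \varepsilon)}$, and `rate`, `info`, and `eps` are illustrative values.

```python
import math

def prob_no_typical_codeword(q, num_codewords):
    """(1 - q)^M for M independent codewords, each typical with probability q.

    Computed through log1p so that a tiny q is not rounded away in 1 - q.
    """
    return math.exp(num_codewords * math.log1p(-q))

def exponential_upper_bound(q, num_codewords):
    """The bound exp(-q * M), which dominates (1 - q)^M for 0 <= q < 1."""
    return math.exp(-q * num_codewords)

def failure_for_blocklength(n, rate, info, eps):
    """Assumed scaling: q = 2^{-n(info + eps)} and M = 2^{n * rate}."""
    q = 2.0 ** (-n * (info + eps))
    m = 2.0 ** (n * rate)
    return prob_no_typical_codeword(q, m), exponential_upper_bound(q, m)

for n in (50, 100, 200):
    exact, bound = failure_for_blocklength(n, rate=0.6, info=0.5, eps=0.05)
    print(n, exact, bound)
```

When the assumed rate exceeds `info + eps`, the product `q * M` grows exponentially in `n`, so the failure probability decays doubly exponentially. This is why an averaged bound of this type implies that at least one good codebook exists.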
Hereafter, codebook C is fixed to satisfy (60). That is, codebook C satisfies We evaluate the distortion function for each $j$.
(i) $j = 1, 2, \ldots, M_n - 1$: where (e) holds because, from Lemma 4, if $x^n_E \in T^n_\delta(X_E \mid \hat{x}^n_R(j))$, then $x^n_R \in T^n_{\delta_1}(X_R \mid \hat{x}^n_R(j))$, and, from Lemma 3, if $\hat{x}^n_R(j) \in T^n_\delta(\hat{X}_R)$ and $x^n_R \in T^n_{\delta_1}(X_R \mid \hat{x}^n_R(j))$, then $(x^n_R, \hat{x}^n_R(j)) \in T^n_{\delta+\delta_1}(X_R, \hat{X}_R)$.
(ii) $j = M_n$: where (f) is due to the definition of $D_{\max}$. We consider $\Pr\{J_n = M_n\}$. From (A14), Therefore, we can confirm From (i) and (ii), we can evaluate utility $u_n$ as below.
for all sufficiently large $n$, where (g) follows from (A18).
Thus, we obtain (58). We can evaluate the privacy leakage against the encoder as shown below.
where (h) is due to the chain rule for mutual information, and (i) and (j) follow because $P_{X^n_K}$ is an i.i.d. distribution. Thus, we have (59).
Next, we show that the probability that the random vector $X^n_K$ is not included in the set $\bigcup_{j=1}^{M_n-1} \tilde{A}(j)$ is sufficiently small. First, notice that where $j_0$ is the index such that $f_n(x^n_E) = j_0$ for $1 \le j_0 \le M_n - 1$. Therefore, by the union upper bound, We evaluate each term in (A22).
(i) The first term: where (k) is because of (A14).
(ii) The second term: If the event in the second term occurs, Therefore, from Lemma 5, $x^n_{E^c} \notin T^n_\delta(X_{E^c} \mid x^n_E, \hat{x}^n_R(j_0))$ holds. Hence, where (l) is due to the Markov chain $X^n_{E^c} - X^n_E - \hat{X}^n_R$, (m) follows from $x^n_E \in T^n_\delta(X_E \mid \hat{x}^n_R(j_0))$ and Lemma 6, and (n) follows because the sets $A(j)$ are disjoint for each $j$. From (A22)-(A24), Therefore, for sufficiently large $n$, and we obtain (61). From Lemma 1, for sufficiently large $n$, for the stochastic matrix $W : \hat{\mathcal{X}}_R \to \mathcal{X}_K$ and $\hat{x}^n_R(j) \in T^n_\delta(\hat{X}_R)$, we can show that
$$\left| \frac{1}{n} \log \left| T^n_{\delta_2}(X_K \mid \hat{x}^n_R(j)) \right| - H(X_K \mid \hat{X}_R) \right| \le \tau. \tag{A27}$$
We can also show from (A27) that
$$2^{n\{H(X_K \mid \hat{X}_R) - \tau\}} \le \left| T^n_{\delta_2}(X_K \mid \hat{x}^n_R(j)) \right| \le 2^{n\{H(X_K \mid \hat{X}_R) + \tau\}}.$$
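The size bound on the conditionally typical set can be made concrete with a simpler analogue. The sketch below is an illustration under stated assumptions, not the set used in the proof: it counts unconditionally typical binary sequences for a Bernoulli($p$) source, with typicality defined (as an assumption) by the fraction of ones lying within `delta` of `p`, and compares the normalized log-count with the entropy. All function names are hypothetical.

```python
import math

def binary_entropy(p):
    """Binary entropy H(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def log2_typical_count(n, p, delta):
    """log2 of the number of length-n binary sequences whose fraction of
    ones is within delta of p (an assumed, simplified typicality rule)."""
    lo = max(math.ceil(n * (p - delta)), 0)
    hi = min(math.floor(n * (p + delta)), n)
    count = sum(math.comb(n, k) for k in range(lo, hi + 1))
    return math.log2(count)

p, delta = 0.3, 0.02
for n in (100, 1000, 5000):
    rate = log2_typical_count(n, p, delta) / n
    print(n, rate, binary_entropy(p), abs(rate - binary_entropy(p)))
```

The gap between the normalized log-count and the entropy plays the role of $\tau$: it does not vanish for fixed `delta`, but it can be made small by choosing `delta` small and `n` large, which mirrors the two-sided exponential bound above.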
(i) The first term: Next, the variational distance between distributions $P^n$ and $Q^n$ is (ii) The second term: Therefore, we obtain We consider the expectation of the second term in (A57) by random coding. Hereafter, we denote the random variable corresponding to the reproduced sequence $\hat{x}^n_R(j)$ as $\hat{X}^n_R(j)$. For notational simplicity, we use the abbreviation
$$\Pr\{X^n_E \notin T^n_\delta(X_E \mid \hat{X}^n_R(j)) \text{ for all } j = 1, 2, \ldots, M_n - 1 \mid X^n_E = x^n_E\} = \Pr\{x^n_E \notin T^n_\delta(X_E \mid \hat{X}^n_R(j)) \text{ for all } j = 1, 2, \ldots, M_n - 1\},$$
and then $E[\Pr\{X^n_E \in T^n_\delta(X_E),\ X^n_E \notin T^n_\delta(X_E \mid \hat{X}^n_R(j)) \text{ for all } j = 1, 2, \ldots, M_n - 1\}]$ $p(x^n_E)\, E \Pr\{x^n_E \notin T^n_\delta(X_E \mid \hat{X}^n_R(1))\}$ Therefore, there exists at least one codebook satisfying (112) in the ensemble obtained using random coding.
Hereafter, codebook C is fixed to satisfy (112). That is, codebook C satisfies For a fixed codebook C, we divide the sequences $x^n_E \in \mathcal{X}^n_E$ into three categories.
• Strongly typical sequences $x^n_E \in T^n_\delta(X_E)$ such that there exists a codeword $\hat{x}^n_R(j_0)$ for some $j_0 = 1, 2, \ldots, M_n - 1$ that is conditionally strongly typical with $x^n_E$. In this case, from Lemma 3, $(x^n_E, \hat{x}^n_R(j_0)) \in T^n_{2\delta}(X_E, \hat{X}_R(j_0))$. Since the codeword is jointly strongly typical with $x^n_E$, the continuity of the distortion as a function of the joint distribution ensures that the resulting distortion is also typical (see [2], Chapters 10.5 and 10.6). Hence, the distortion between such an $x^n_E$ and its codeword is bounded by $D + \delta$, where $\delta$ goes to 0 as $n \to \infty$. In the first-order analysis, that is, as $n \to \infty$, we can regard $D + \delta$ as $D$.
• Strongly typical sequences $x^n_E \in T^n_\delta(X_E)$ such that $f_n(x^n_E) = M_n$.
• Non-strongly typical sequences $x^n_E \notin T^n_\delta(X_E)$.
The sequences in the second and third categories are encoded as $f_n(x^n_E) = M_n$. For the sequences in the third category, the distortion may exceed $D$, although it is bounded by $d_{\max}$. Then, the excess-distortion probability is evaluated as Hence, for an appropriate choice of parameters and $n$, we can ensure that the excess-distortion probability of all badly represented sequences is as small as we want. We obtain (113). We can evaluate the privacy leakage against the encoder as below.
where (e) is due to the chain rule for mutual information, and (f) and (g) follow because $P_{X^n_K}$ is an i.i.d. distribution. Thus, we have (114).
Next, we show that the probability that the random vector $X^n_K$ is not included in the set $\bigcup_{j=1}^{M_n-1} \tilde{A}(j)$ is sufficiently small. First, notice that where $j_0$ is the index such that $f_n(x^n_E) = j_0$ for $1 \le j_0 \le M_n - 1$. Therefore, by the union upper bound, Therefore, for sufficiently large $n$, and we obtain (115). From Lemma 1, for sufficiently large $n$, for the stochastic matrix $W : \hat{\mathcal{X}}_R \to \mathcal{X}_K$ and $\hat{x}^n_R(j) \in T^n_\delta(\hat{X}_R)$, we can show that
$$\left| \frac{1}{n} \log \left| T^n_{\delta_2}(X_K \mid \hat{x}^n_R(j)) \right| - H(X_K \mid \hat{X}_R) \right| \le \tau. \tag{A73}$$
We can also show from (A73) that
$$2^{n\{H(X_K \mid \hat{X}_R) - \tau\}} \le \left| T^n_{\delta_2}(X_K \mid \hat{x}^n_R(j)) \right| \le 2^{n\{H(X_K \mid \hat{X}_R) + \tau\}}.$$